TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Syntactic and pragmatic completeness is known to be important for turn-taking
prediction, but so far machine learning models of turn-taking have used such
linguistic information in a limited way. In this paper, we introduce TurnGPT, a
transformer-based language model for predicting turn-shifts in spoken dialog.
The model has been trained and evaluated on a variety of written and spoken
dialog datasets. We show that the model outperforms two baselines used in prior
work. We also report on an ablation study, as well as attention and gradient
analyses, which show that the model is able to utilize the dialog context and
pragmatic completeness for turn-taking prediction. Finally, we explore the
model's potential in not only detecting, but also projecting, turn-completions.
Comment: Accepted to Findings of ACL: EMNLP 202
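The core idea of casting turn-taking as language modelling can be illustrated with a toy sketch: a model assigns a probability to a special turn-shift token after each word, and a shift is predicted wherever that probability crosses a threshold. The hand-built bigram table and all probabilities below are made up for illustration; this is not the trained transformer model described in the abstract.

```python
# Toy illustration of turn-shift prediction as language modelling.
# A trained model assigns P(<ts> | context) after each word; here a
# hand-built bigram table with invented probabilities stands in for it.

BIGRAM_TS_PROB = {
    # P(turn shift | previous word): higher after pragmatically
    # complete words, lower mid-phrase (illustrative values only).
    "today": 0.5,
    "yes": 0.6,
    "the": 0.01,
    "want": 0.05,
}

def turn_shift_probs(utterance, default=0.1):
    """Return P(turn shift) after each word of the utterance."""
    return [BIGRAM_TS_PROB.get(w.lower(), default) for w in utterance.split()]

def predict_shift_points(utterance, threshold=0.4):
    """Indices of words after which a turn shift is predicted."""
    return [i for i, p in enumerate(turn_shift_probs(utterance))
            if p >= threshold]

print(predict_shift_points("I want the tickets today"))  # [4]
```

A real model would replace the lookup table with the probability a causal language model assigns to a turn-shift token, which also makes *projecting* upcoming completions possible by scoring continuations before they are spoken.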
A General, Abstract Model of Incremental Dialogue Processing
We present a general model and conceptual framework for specifying architectures for incremental processing in dialogue systems, in particular with respect to the topology of the network of modules that make up the system, the way information flows through this network, how information increments are ‘packaged’, and how these increments are processed by the modules. This model enables the precise specification of incremental systems and hence facilitates detailed comparisons between systems, as well as giving guidance on designing new systems. In particular, the model can serve as a framework for specifying module communication in such systems, as we illustrate with some examples.
The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue
There is a surge in interest in the development of open-domain chatbots,
driven by the recent advancements of large language models. The "openness" of
the dialogue is expected to be maximized by providing minimal information to
the users about the common ground they can expect, including the presumed joint
activity. However, evidence suggests that the effect is the opposite. Asking
users to "just chat about anything" results in a very narrow form of dialogue,
which we refer to as the "open-domain paradox". In this position paper, we
explain this paradox through the theory of common ground as the basis for
human-like communication. Furthermore, we question the assumptions behind
open-domain chatbots and identify paths forward for enabling common ground in
human-computer dialogue.
Comment: Accepted at SIGDIAL 202
How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation
Open-domain chatbots are supposed to converse freely with humans without
being restricted to a topic, task or domain. However, the boundaries and/or
contents of open-domain conversations are not clear. To clarify the boundaries
of "openness", we conduct two studies: First, we classify the types of "speech
events" encountered in a chatbot evaluation data set (i.e., Meena by Google)
and find that these conversations mainly cover the "small talk" category and
exclude the other speech event categories encountered in real life human-human
communication. Second, we conduct a small-scale pilot study to generate online
conversations covering a wider range of speech event categories between two
humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook).
A human evaluation of these generated conversations indicates a preference for
human-human conversations, since the human-chatbot conversations lack coherence
in most speech event categories. Based on these results, we suggest (a) using
the term "small talk" instead of "open-domain" for the current chatbots which
are not that "open" in terms of conversational abilities yet, and (b) revising
the evaluation methods to test the chatbot conversations against other speech
events.
Resolving References in Visually-Grounded Dialogue via Text Generation
Vision-language models (VLMs) have been shown to be effective at image retrieval
based on simple text queries, but text-image retrieval based on conversational
input remains a challenge. Consequently, if we want to use VLMs for reference
resolution in visually-grounded dialogue, the discourse processing capabilities
of these models need to be augmented. To address this issue, we propose
fine-tuning a causal large language model (LLM) to generate definite
descriptions that summarize coreferential information found in the linguistic
context of references. We then use a pretrained VLM to identify referents based
on the generated descriptions, zero-shot. We evaluate our approach on a
manually annotated dataset of visually-grounded dialogues and achieve results
that, on average, exceed the performance of the baselines we compare against.
Furthermore, we find that using referent descriptions based on larger context
windows has the potential to yield higher returns.
Comment: Published at SIGDIAL 202
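The two-stage pipeline in this abstract can be sketched in miniature: a language model summarizes the coreferential information in the dialogue into a definite description, and a vision-language scorer then picks the image that best matches it. Both components below are illustrative stubs, a keyword heuristic and a word-overlap score over captions, not the fine-tuned LLM or the pretrained VLM used in the paper.

```python
# Toy sketch of the two-stage pipeline: (1) turn the dialogue context
# into a definite description, (2) retrieve the best-matching image.

def generate_definite_description(dialogue):
    """Stand-in for the fine-tuned causal LLM: collapse the last
    mention in the context into a definite description."""
    stop = {"the", "that", "one", "it", "pick"}
    content = [w for w in dialogue[-1].lower().split() if w not in stop]
    return "the " + " ".join(content)

def similarity(description, caption):
    """Stand-in for VLM image-text similarity: Jaccard word overlap."""
    d, c = set(description.split()), set(caption.split())
    return len(d & c) / max(len(d | c), 1)

def resolve(dialogue, image_captions):
    """Return the image whose caption best matches the description
    generated from the dialogue context (zero-shot retrieval step)."""
    desc = generate_definite_description(dialogue)
    return max(image_captions, key=lambda img: similarity(desc, image_captions[img]))

images = {
    "img1": "a red car parked on the street",
    "img2": "a blue bicycle leaning on a wall",
}
dialogue = ["do you see the red car", "yes", "pick that red car"]
print(resolve(dialogue, images))  # img1
```

In the actual approach, the description generator is a fine-tuned causal LLM conditioned on the linguistic context of the reference, and the scorer is a pretrained VLM applied zero-shot; widening the context window given to the generator is what the abstract reports as promising.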
Collecting Visually-Grounded Dialogue with A Game Of Sorts
An idealized, though simplistic, view of the referring expression production
and grounding process in (situated) dialogue assumes that a speaker must merely
appropriately specify their expression so that the target referent may be
successfully identified by the addressee. However, referring in conversation is
a collaborative process that cannot be aptly characterized as an exchange of
minimally-specified referring expressions. Concerns have been raised regarding
assumptions made by prior work on visually-grounded dialogue that reveal an
oversimplified view of conversation and the referential process. We address
these concerns by introducing a collaborative image ranking task, a grounded
agreement game we call "A Game Of Sorts". In our game, players are tasked with
reaching agreement on how to rank a set of images given some sorting criterion
through a largely unrestricted, role-symmetric dialogue. By putting emphasis on
the argumentation in this mixed-initiative interaction, we collect discussions
that involve the collaborative referential process. We describe results of a
small-scale data collection experiment with the proposed task. All discussed
materials, which include the collected data, the codebase, and a containerized
version of the application, are publicly available.
Comment: Published at LREC 202